After a teddy bear talked about kink, AI watchdogs are warning parents against smart toys
'Children could become attached to a bot rather than a person or imaginary friend, which could hurt their development.'

Advocates are fighting against the $16.7bn global smart-toy market, decrying surveillance and a lack of regulation.

As the holiday season looms into view with Black Friday, one category on people's gift lists is causing increasing concern: products with artificial intelligence. The development has raised new concerns about the dangers smart toys could pose to children, as consumer advocacy groups say AI could harm kids' safety and development. The trend has prompted calls for increased testing of such products and for governmental oversight.
Supplementary Material for LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Anonymous Author(s)

A Implementation Details
Table 1: The prepending instructions provided to GPT-3.5/4 during LayoutGPT's 2D and 3D tasks.

Task Instruction for GPT-3.5/4

2D Layout Planning Instruction: Given a sentence prompt that will be used to generate an image, plan the layout of the image. Formally, each line should be like "object {width:?px; height:?px; left:?px; top:?px; }". Formally, each line should follow the template: FURNITURE {length:?px:
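The CSS-style layout template above is simple enough to round-trip programmatically. As a minimal sketch (not code from the paper; the regexes and function name are illustrative assumptions), the following parser extracts an object name and its pixel attributes from one such line:

```python
import re

# Illustrative parser for the CSS-like layout lines LayoutGPT is asked to
# emit, e.g. 'chair {width:120px; height:80px; left:30px; top:200px; }'.
LINE_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")
ATTR_RE = re.compile(r"(\w+)\s*:\s*(\d+)px")

def parse_layout_line(line):
    """Return (object_name, {attribute: pixel_value}) for one layout line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"not a layout line: {line!r}")
    name, body = match.group(1), match.group(2)
    return name, {k: int(v) for k, v in ATTR_RE.findall(body)}

name, box = parse_layout_line("chair {width:120px; height:80px; left:30px; top:200px; }")
```

Parsing the model's text output back into numeric boxes like this is what lets a downstream image or scene generator consume the planned layout.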
V100 GPUs were used to train the models. Consortium and are licensed under a Creative Commons Attribution 4.0 License. Similarly, for evaluating the agent listener with a human speaker, each agent evaluates 400 human utterances (Fig. 5b). In Fig. 10, we present the results of the human evaluation on the text game. In Sec. 4.3, we show that agents trained using our method beat all prior baselines when paired with both. The blue bars show the standard deviation across all agents present in the buffer.
When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models
Choi, Dasol, Lee, Jihwan, Lee, Minjae, Kahng, Minsuk
While prior research on text-to-image generation has predominantly focused on biases in human depictions, we investigate a more subtle yet pervasive phenomenon: demographic bias in generated objects (e.g., cars). We introduce SODA (Stereotyped Object Diagnostic Audit), a novel framework for systematically measuring such biases. Our approach compares visual attributes of objects generated with demographic cues (e.g., "for young people") to those from neutral prompts, across 2,700 images produced by three state-of-the-art models (GPT Image-1, Imagen 4, and Stable Diffusion) in five object categories. Through a comprehensive analysis, we uncover strong associations between specific demographic groups and visual attributes, such as recurring color patterns prompted by gender or ethnicity cues. These patterns reflect and reinforce not only well-known stereotypes but also more subtle and unintuitive biases. We also observe that some models generate less diverse outputs, which in turn amplifies the visual disparities compared to neutral prompts. Our proposed auditing framework offers a practical approach for testing, revealing how stereotypes remain embedded in today's generative models. We see this as an essential step toward more systematic and responsible AI development.
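The audit SODA describes reduces, at its core, to comparing a visual attribute under a demographic cue against the same attribute under a neutral prompt. A minimal sketch of that comparison, using mean RGB color as the attribute and random arrays standing in for generated images (none of this is the paper's actual code or metric):

```python
import numpy as np

# Toy stand-ins for generated images: arrays of shape (N, H, W, 3).
# In a real audit these would be model outputs for the two prompt variants.
rng = np.random.default_rng(0)
neutral_imgs = rng.uniform(0, 255, size=(10, 64, 64, 3))  # neutral prompt
cued_imgs = neutral_imgs * 0.5                            # demographic-cued prompt

def mean_color(images):
    """Average RGB color over every pixel of every image in the set."""
    return images.reshape(-1, 3).mean(axis=0)

# One possible disparity score: distance between the two sets' mean colors.
disparity = float(np.linalg.norm(mean_color(cued_imgs) - mean_color(neutral_imgs)))
```

A large disparity for a given cue, relative to the neutral baseline, is the kind of signal such an audit would flag for closer inspection.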
SPARK: Graph-Based Online Semantic Integration System for Robot Task Planning
Shirasaka, Mimo, Ikeda, Yuya, Matsushima, Tatsuya, Matsuo, Yutaka, Iwasawa, Yusuke
The ability to update information acquired through various means online during task execution is crucial for a general-purpose service robot. This information includes geometric and semantic data. While SLAM handles geometric updates on 2D maps or 3D point clouds, online updating of semantic information remains unexplored. We attribute the challenge to finding an online scene graph representation that is both useful and scalable. Building on previous work on offline scene graph representations, we study online graph representations of semantic information. We introduce SPARK: Spatial Perception and Robot Knowledge Integration. This framework extracts semantic information from environment-embedded cues and updates the scene graph accordingly, which is then used for subsequent task planning. We demonstrate that graph representations of spatial relationships enhance the robot system's ability to perform tasks in dynamic environments and to adapt to unconventional spatial cues, such as gestures.
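An online scene graph of the kind SPARK maintains can be sketched as a mapping from object pairs to spatial relations that is revised as new cues arrive; the class and method names below are hypothetical, not SPARK's API:

```python
# Hypothetical online scene graph: nodes are objects, edges carry spatial
# relations, and update() revises the graph as cues arrive during execution.
class SceneGraph:
    def __init__(self):
        self.edges = {}  # (subject, object) -> relation

    def update(self, subject, relation, obj):
        """Insert or overwrite one observed spatial relation."""
        self.edges[(subject, obj)] = relation

    def query(self, subject, obj):
        """Return the current relation between two objects, if any."""
        return self.edges.get((subject, obj))

graph = SceneGraph()
graph.update("cup", "on", "table")       # initial observation
graph.update("cup", "next_to", "table")  # later cue revises the relation online
```

The point of the online setting is exactly this last step: a later observation overwrites a stale relation, so subsequent task planning queries a current view of the scene rather than a snapshot.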
ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge
Walczak, Mikolaj, Kallakuri, Uttej, Mohsenin, Tinoosh
Autonomous systems deployed on edge devices face significant challenges, including resource constraints, real-time processing demands, and adaptation to dynamic environments. This work introduces ATLASv2, a novel system that integrates a fine-tuned TinyLLM, real-time object detection, and efficient path planning to enable hierarchical, multi-task navigation and manipulation entirely on an edge device, the Jetson Nano. ATLASv2 dynamically expands its set of navigable landmarks by detecting and localizing objects in the environment, which are saved to its internal knowledge base for use in future task execution. We evaluate ATLASv2 in real-world environments, including handcrafted home and office settings constructed with diverse objects and landmarks. Results show that ATLASv2 effectively interprets natural language instructions, decomposes them into low-level actions, and executes tasks with high success rates. By leveraging generative AI in a fully on-board framework, ATLASv2 achieves optimized resource utilization with minimal prompting latency and power consumption, bridging the gap between simulated environments and real-world applications.
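ATLASv2's landmark-acquisition loop can be illustrated with a toy knowledge base in which detections become navigable landmarks that later instructions are grounded against; all names and positions below are hypothetical, not the system's actual interface:

```python
# Toy landmark knowledge base: detected objects are recorded so later
# natural-language instructions can be resolved to known locations.
class LandmarkKB:
    def __init__(self):
        self.landmarks = {}  # label -> (x, y) position

    def add_detection(self, label, position):
        """Record a newly detected object as a navigable landmark."""
        self.landmarks[label] = position

    def resolve(self, label):
        """Look up a landmark mentioned in an instruction, if known."""
        return self.landmarks.get(label)

kb = LandmarkKB()
kb.add_detection("coffee mug", (1.2, 0.4))  # detected during an earlier task
target = kb.resolve("coffee mug")           # grounded for a later instruction
```

An unresolvable label would fall back to exploration or detection, which is how such a system grows its landmark set over time.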
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Cao, Yue, Liu, Yangzhou, Gao, Zhangwei, Zhu, Jinguo, Zhu, Xizhou, Lu, Lewei, Qiao, Yu, Dai, Jifeng
Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly in Chain-of-Thought (CoT) performance. To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, (1) on the data side, we design an automated preference data construction pipeline to create MMPR, a high-quality, large-scale multimodal reasoning preference dataset; and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance. Our approach demonstrates improved performance across multiple benchmarks, particularly in multimodal reasoning tasks. Notably, our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B. We hope this study inspires further advancements in MLLMs. Code, data, and models will be publicly released.
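MPO builds on preference optimization; its full objective mixes several terms, but the DPO-style preference component that such methods start from can be sketched as follows (the log-probabilities are placeholders, and this is an illustration of the general technique rather than the paper's implementation):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * ((pi_c - pi_r) - (ref_c - ref_r)))).

    Rewards the policy for widening its chosen-vs-rejected log-likelihood
    margin beyond the frozen reference model's margin.
    """
    margin = beta * ((pi_chosen - pi_rejected) - (ref_chosen - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does -> lower loss.
low = dpo_loss(pi_chosen=-1.0, pi_rejected=-5.0, ref_chosen=-2.0, ref_rejected=-3.0)
high = dpo_loss(pi_chosen=-5.0, pi_rejected=-1.0, ref_chosen=-2.0, ref_rejected=-3.0)
```

The preference pairs consumed by such a loss are exactly what a dataset like MMPR supplies: a chosen and a rejected response for each multimodal prompt.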